Tabulation-Based 5-Independent Hashing with Applications to Linear Probing and Second Moment Estimation
Abstract
In the framework of Carter and Wegman, a k-independent hash function maps any k keys independently. It is known that 5-independent hashing provides good expected performance in applications such as linear probing and second moment estimation for data streams. The classic 5-independent hash function evaluates a degree-4 polynomial over a prime field containing the key domain [n] = {0, ..., n − 1}. Here we present an efficient 5-independent hash function that uses no multiplications. Instead, for any parameter c, we make 2c − 1 lookups in tables of size O(n^{1/c}). In experiments on different computers, our scheme was faster than the polynomial method by factors of 1.8 to 10. We also conducted experiments on the performance of hash functions inside the above applications. In particular, we give realistic examples of inputs that make the most popular 2-independent hash function perform quite poorly. This illustrates the advantage of using schemes with provably good expected performance for all inputs.
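The classic polynomial method mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the choice of the Mersenne prime 2^61 − 1 and the names `P`, `poly_hash`, and `random_coeffs` are assumptions made for the example, and keys are assumed to be smaller than `P`.

```python
import secrets

# Assumption: a Mersenne prime large enough to contain the key domain.
P = (1 << 61) - 1


def random_coeffs(k=5, p=P):
    """Draw k random coefficients, defining a polynomial of degree k - 1 over Z_p."""
    return [secrets.randbelow(p) for _ in range(k)]


def poly_hash(x, coeffs, p=P):
    """Evaluate the polynomial at x modulo p using Horner's rule.

    With 5 uniformly random coefficients this is the degree-4 polynomial
    scheme; each step costs one multiplication and one modular reduction.
    """
    acc = 0
    for a in coeffs:
        acc = (acc * x + a) % p
    return acc


coeffs = random_coeffs()
value = poly_hash(123456789, coeffs)
```

Each evaluation performs four non-trivial multiplications modulo the prime, which is the cost the tabulation-based scheme avoids by using table lookups instead.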
Related resources
Appendix for Tabulation Based 4-Universal Hashing with Applications to Second Moment Estimation
Tabulation Based 5-Universal Hashing and Linear Probing
Previously [SODA’04] we devised the fastest known algorithm for 4-universal hashing. The hashing was based on small pre-computed 4-universal tables. This led to a five-fold improvement in speed over direct methods based on degree 3 polynomials. In this paper, we show that if the pre-computed tables are made 5-universal, then the hash value becomes 5-universal without any other change to the comp...
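A rough sketch of what a table-based hash of this kind can look like for keys split into two characters (the c = 2 case, giving 2c − 1 = 3 lookups). The specific form used here, two lookups on the key's characters plus one lookup on the derived character a + b, is an assumption based on the usual description of the SODA'04 construction and is not spelled out in the text above. The 8-bit character width and the names `make_tables` and `tab_hash` are hypothetical. The tables are filled with fully random values, which are in particular 5-independent.

```python
import secrets

# Assumption for illustration: 16-bit keys split into two 8-bit characters.
CHAR_BITS = 8
CHAR_MASK = (1 << CHAR_BITS) - 1


def make_tables(out_bits=32):
    """Build three lookup tables of random out_bits-bit values."""
    size = 1 << CHAR_BITS
    t0 = [secrets.randbits(out_bits) for _ in range(size)]
    t1 = [secrets.randbits(out_bits) for _ in range(size)]
    # The derived character a + b ranges over 0 .. 2 * (size - 1),
    # so the third table needs 2 * size - 1 entries.
    t2 = [secrets.randbits(out_bits) for _ in range(2 * size - 1)]
    return t0, t1, t2


def tab_hash(x, tables):
    """Hash a 16-bit key with three lookups, XORs, and one addition."""
    t0, t1, t2 = tables
    a = x & CHAR_MASK                 # low character
    b = (x >> CHAR_BITS) & CHAR_MASK  # high character
    return t0[a] ^ t1[b] ^ t2[a + b]


tables = make_tables()
value = tab_hash(0x1234, tables)
```

Note that the evaluation uses no multiplications: only shifts, masks, one addition, three lookups, and two XORs.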
Approximately Minwise Independence with Twisted Tabulation
A random hash function h is ε-minwise if for any set S, |S| = n, and element x ∈ S, Pr[h(x) = min h(S)] = (1 ± ε)/n. Minwise hash functions with low bias ε have widespread applications within similarity estimation. Hashing from a universe [u], the twisted tabulation hashing of Pǎtraşcu and Thorup [SODA’13] makes c = O(1) lookups in tables of size u^{1/c}. Twisted tabulation was invented to get good ...
Linear Probing with 5-wise Independence
Hashing with linear probing dates back to the 1950s, and is among the most studied algorithms for storing (key,value) pairs. In recent years it has become one of the most important hash table organizations since it uses the cache of modern computers very well. Unfortunately, previous analyses rely either on complicated and space consuming hash functions, or on the unrealistic assumption of free...
Practical Hash Functions for Similarity Estimation and Dimensionality Reduction
Hashing is a basic tool for dimensionality reduction employed in several aspects of machine learning. However, the performance analysis is often carried out under the abstract assumption that a truly random unit cost hash function is used, without concern for which concrete hash function is employed. The concrete hash function may work fine on sufficiently random input. The question is if they c...
Journal:
- SIAM J. Comput.
Volume 41
Publication date: 2012